ABSTRACT. Computing value of information (VOI) is a crucial task in various aspects of
decision-making under uncertainty, such as in meta-reasoning for search; in selecting measurements
to make, prior to choosing a course of action; and in managing the exploration vs. exploitation
tradeoff. Since such applications typically require numerous VOI computations during a single run,
it is essential that VOI be computed efficiently. We examine the issue of anytime estimation of VOI,
as frequently it suffices to get a crude estimate of the VOI, thus saving considerable computational
resources. As a case study, we examine VOI estimation in the measurement selection problem.
Empirical evaluation of the proposed scheme in this domain shows that computational resources
can indeed be significantly reduced, at little cost in expected rewards achieved in the overall decision
problem.

1 INTRODUCTION 


Problems of decision-making under uncertainty frequently contain cases where information can be
obtained using some costly actions, called measurement actions. In order to act rationally in the
decision-theoretic sense, measurement plans are typically optimized based on some form of value
of information (VOI). Computing VOI can also be computationally intensive. Since frequently an
exact VOI is not needed in order to proceed (e.g. it is sufficient to determine that the VOI of a
certain measurement is much lower than that of another measurement, at a certain point in time),
significant computational resources can be saved by controlling the resources used for estimating
the VOI. This paper examines this tradeoff via a case study of measurement selection.

In general, computation of value of information (VOI), even under the commonly used
simplifying myopic assumption, involves multidimensional integration of a general function
[Russell and Wefald, 1991]. For some problems, the integral can be computed efficiently
[Russell and Wefald, 1989]; but when the utility function is computationally intensive or when
a non-myopic estimate is used, the time required to compute the value of information can be
significant [Heckerman et al., 1993] [Bilgic and Getoor, 2007] and must be taken into account while
computing the net value of information. This paper presents and analyzes an extension of the
known greedy algorithm that decides when to recompute VOI of each of the measurements based
on the principles of limited rationality [Russell and Wefald, 1991].

Although it may be possible to use this idea in more general settings, this paper
mainly examines on-line most informative measurement selection [Krause and Guestrin, 2007]
[Bilgic and Getoor, 2007], an approach which is commonly used to solve problems of optimization
under uncertainty [Zheng et al., 2005] [Krause et al., 2008]. Since this approach assumes that
the computation time required to select the most informative measurement is negligible compared
to the measurement time [Russell and Wefald, 1991], it is important in this setting to ascertain that
VOI estimation indeed does not consume excessive computational resources.






URPDM2010 

2 THE MEASUREMENT SELECTION PROBLEM 

As our case study, we examine the following optimization problem. Given: 

• A set of $N_s$ items $S = \{s_1, s_2, \ldots, s_{N_s}\}$.

• A set of $N_f$ item features $Z = \{z_1, z_2, \ldots, z_{N_f}\}$. (Each feature $z_i$ has a domain $\mathcal{D}(z_i)$.)

• A joint distribution over the features of the items in $S$. That is, a joint distribution over the
random variables $\{z_1(s_1), z_2(s_1), \ldots, z_1(s_2), z_2(s_2), \ldots\}$.

• A set of measurement types $M = \{(c, p)_k \mid k \in 1..N_m\}$, with potentially different intrinsic
measurement cost $c$ and observation distribution $p$, conditional on the true feature values, for
each measurement type.

• A utility function $u(z): \mathbb{R}^{N_f} \to \mathbb{R}$ on features. In the simplest case, there is just one real-valued
feature, acting as the item's utility value, and $u$ is simply the identity function.

• A measurement budget C. 

Find a policy of measurement decisions and a final selection that maximize the expected net utility
of the selection (the expected reward):

$$\max: R = u(z(s_\alpha)) - \sum_{i=1}^{N_q} c_{k_i} \quad \text{s.t.:} \quad \sum_{i=1}^{N_q} c_{k_i} \le C \tag{1}$$

where $Q = \{(k_i, s_i) \mid i \in 1..N_q\}$ is the performed measurement sequence and $s_\alpha$ is the selected
item. A next measurement is selected on-line, after the outcomes of all preceding measurements
are known.

The above selection problem is intractable, and is therefore commonly solved approximately using
a greedy heuristic algorithm. The greedy algorithm selects a measurement $m_{j_{\max}}$ with the greatest
net value of information $V_{j_{\max}}$. The net value of information is the difference between the intrinsic
value of information and the measurement cost:

$$V_j = \Lambda_j - c_{k_j} \tag{2}$$

The intrinsic value of information $\Lambda_j$ is the expected difference in the true utility of the finally
selected item, after and before the measurement, where $s_{\alpha_j}$ denotes the item selected after
measurement $m_j$ and $s_\alpha$ the item selected without it:

$$\Lambda_j = \mathbb{E}\left(\mathbb{E}[u(z(s_{\alpha_j}))] - \mathbb{E}[u(z(s_\alpha))]\right) \tag{3}$$

Exact computation of $\Lambda_j$ is intractable, and various estimates are used, including the myopic
estimate [Russell and Wefald, 1991] and semi-myopic schemes [Tolpin and Shimony, 2010].
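
To make the myopic estimate concrete, the following minimal sketch covers only the simplest case mentioned above: one real-valued feature per item, identity utility, independent Gaussian beliefs about the items, and Gaussian measurement noise. These assumptions are made for the illustration only; under them the myopic estimate of $\Lambda_j$ has a closed form. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def myopic_intrinsic_voi(means, variances, i, noise_var):
    """Myopic estimate of Lambda for one noisy measurement of item i.

    Assumes (for illustration only) independent Gaussian beliefs
    N(means[j], variances[j]) and identity utility; needs >= 2 items.
    """
    if variances[i] == 0.0:
        return 0.0  # the item is known exactly, nothing to learn
    best_other = max(m for j, m in enumerate(means) if j != i)
    # Standard deviation of the shift of the posterior mean of item i.
    s = variances[i] / math.sqrt(variances[i] + noise_var)
    d = abs(means[i] - best_other)
    # E[max(X - d, 0)] for X ~ N(0, s^2).
    return s * normal_pdf(d / s) - d * normal_cdf(-d / s)


def myopic_net_voi(means, variances, i, noise_var, cost):
    """Net value of information: intrinsic value minus measurement cost."""
    return myopic_intrinsic_voi(means, variances, i, noise_var) - cost
```

The estimate is largest for items whose mean is close to the current best alternative and whose belief is still wide, which is the behavior the greedy algorithm relies on.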

The pseudocode for the algorithm is presented as Algorithm 1. At each step, the algorithm
recomputes the value of information estimate of every measurement. The assumptions behind the
greedy algorithm are justified when the cost of selecting a next measurement is negligible compared
to the measurement cost. However, optimization problems with hundreds and thousands of items
are common [Tolpin and Shimony, 2010]; and even if the value of information of a single measurement
can be computed efficiently [Russell and Wefald, 1989], the cost of estimating the value of
information of all measurements becomes comparable to and outgrows the cost of performing a
measurement.

Recomputation of the value of information for every measurement is often unnecessary, especially 
when using the "blinkered" scheme [Tolpin and Shimony, 2010], a greedy algorithm which attempts
to also compute VOI for sequences of measurements of the same type. When there are many different 

Algorithm 1 Greedy measurement selection
 1: $\mathit{budget} \leftarrow C$
 2: Initialize beliefs
 3: loop
 4:   for all items $s_i$ do
 5:     Compute $\mathbb{E}(U_i)$
 6:   for all measurements $m_j$ do
 7:     if $c_j \le \mathit{budget}$ then
 8:       Compute $V_j$
 9:     else
10:       $V_j \leftarrow 0$
11:   $j_{\max} \leftarrow \arg\max_j V_j$
12:   if $V_{j_{\max}} > 0$ then
13:     Perform measurement $m_{j_{\max}}$; Update beliefs; $\mathit{budget} \leftarrow \mathit{budget} - c_{j_{\max}}$
14:   else
15:     break
16: $\alpha \leftarrow \arg\max_i \mathbb{E}(U_i)$
17: return $s_\alpha$


measurements, the value of information of most measurements is unlikely to change abruptly due
to the result of just one other measurement. With an appropriate uncertainty model, it can be shown
that the VOI of only a few of the measurements must be recomputed after each measurement, thus
decreasing the computation time and ensuring that the greedy algorithm exhibits a more rational
behavior w.r.t. computational resources.
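
The greedy loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative skeleton rather than a reference implementation: the belief model is hidden behind caller-supplied callables (`compute_net_voi`, `perform`, `expected_utilities`), all of which are hypothetical names introduced here.

```python
def greedy_measurement_selection(budget, costs, compute_net_voi, perform,
                                 expected_utilities):
    """Sketch of Algorithm 1.

    costs[j]              cost of measurement j
    compute_net_voi(j)    current net VOI estimate V_j (hypothetical callable)
    perform(j)            performs measurement j and updates beliefs
    expected_utilities()  list of E(U_i) under the current beliefs
    Returns the index of the selected item and the measurement sequence.
    """
    sequence = []
    n = len(costs)
    while True:
        # The VOI of every affordable measurement is recomputed at every step.
        values = [compute_net_voi(j) if costs[j] <= budget else 0.0
                  for j in range(n)]
        j_max = max(range(n), key=values.__getitem__)
        if values[j_max] <= 0.0:
            break  # no measurement is worth its cost
        perform(j_max)
        budget -= costs[j_max]
        sequence.append(j_max)
    utilities = expected_utilities()
    selected = max(range(len(utilities)), key=utilities.__getitem__)
    return selected, sequence
```

Each pass through the loop makes one VOI computation per affordable measurement, which is the cost that selective recomputation aims to reduce.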

3 RATIONAL COMPUTATION OF VALUE OF INFORMATION 

For the selective VOI recomputation, the belief $BEL(\Lambda_j)$ about the intrinsic value of information
of measurement $m_j$ is modeled by a normal distribution with variance $\varsigma_j^2$:

$$BEL(\Lambda_j) = \mathcal{N}(\Lambda_j, \varsigma_j^2) \tag{4}$$

After a measurement is performed, and the beliefs about the item features are updated (line 13 of
Algorithm 1), the belief about $\Lambda_j$ becomes less certain. Under the assumption that the influence of
each measurement on the value of information of other measurements is independent of the influence
of any other measurement, the uncertainty is expressed by adding Gaussian noise with variance $\tau^2$
to the belief:

$$\varsigma_j^2 \leftarrow \varsigma_j^2 + \tau^2 \tag{5}$$

When $\Lambda_j$ of measurement $m_j$ is computed, $BEL(\Lambda_j)$ becomes exact ($\varsigma_j^2 \leftarrow 0$). At the beginning of
the algorithm, the beliefs about the intrinsic value of information of measurements are computed
from the initial beliefs about item features.
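
The bookkeeping implied by (4) and (5) is small. The sketch below is one possible representation, with a hypothetical class name and method names; it stores the last computed intrinsic value as the mean of the belief and tracks only the variance.

```python
from dataclasses import dataclass


@dataclass
class VoiBelief:
    """Belief about the intrinsic VOI of one measurement (illustrative)."""
    mean: float            # last computed intrinsic value of information
    variance: float = 0.0  # uncertainty accumulated since that computation

    def after_other_measurement(self, tau2):
        # Rule (5): every performed measurement adds Gaussian noise.
        self.variance += tau2

    def after_recomputation(self, value):
        # Recomputing the VOI makes the belief exact again.
        self.mean = value
        self.variance = 0.0
```

Under the independence assumption, a belief that has not been refreshed for $n$ measurements has variance $n\tau^2$, so stale estimates become progressively more attractive candidates for recomputation.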

In the algorithm that recomputes the value of information selectively, the initial beliefs about the
intrinsic value of information are computed immediately after line 2 in Algorithm 1, and lines 6-11
of Algorithm 1 are replaced by Algorithm 2. While the number of iterations in lines 7-12 of
Algorithm 2 is the same as in lines 6-10 of Algorithm 1, $W_k$ is efficiently computable, and the
subset of measurements for which the value of information is computed in line 15 of Algorithm 2
is controlled by the computation cost $c_V$:

$$W_k = \frac{\varsigma_k}{\sqrt{2\pi}} \exp\left(-\frac{(V_\gamma - V_k)^2}{2\varsigma_k^2}\right) - |V_\gamma - V_k| \, \Phi\left(-\frac{|V_\gamma - V_k|}{\varsigma_k}\right) - c_V \tag{6}$$
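
Algorithm 2 Rational computation of the value of information
 1: for all measurements $m_j$ do
 2:   if $c_j \le \mathit{budget}$ then
 3:     $V_j \leftarrow \Lambda_j - c_j$; $\varsigma_j \leftarrow \sqrt{\varsigma_j^2 + \tau^2}$
 4:   else
 5:     $V_j \leftarrow 0$; $\varsigma_j \leftarrow 0$
 6: loop
 7:   for all measurements $m_k$ do
 8:     if $c_k \le \mathit{budget}$ then
 9:       Compute $W_k$
10:     else
11:       $W_k \leftarrow 0$
12:   $k_{\max} \leftarrow \arg\max_k W_k$
13:   if $W_{k_{\max}} \le 0$ then
14:     break
15:   Compute $\Lambda_{k_{\max}}$; $V_{k_{\max}} \leftarrow \Lambda_{k_{\max}} - c_{k_{\max}}$; $\varsigma_{k_{\max}} \leftarrow 0$
16: $j_{\max} \leftarrow \arg\max_j V_j$
17: Compute $\Lambda_{j_{\max}}$; $V_{j_{\max}} \leftarrow \Lambda_{j_{\max}} - c_{j_{\max}}$; $\varsigma_{j_{\max}} \leftarrow 0$
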



where $V_\gamma$ is the highest value of information if any but the highest value of information is
recomputed, and the next to highest value of information $V_\beta$ if the highest value of information is
recomputed; $\Phi(x)$ is the Gaussian cumulative probability of $x$ for $\mu = 0$, $\sigma = 1$.
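
The selective recomputation step can be sketched in Python as follows. The sketch assumes that $W_k$ takes the standard closed form of $\mathbb{E}[\max(X - d, 0)]$ for a zero-mean Gaussian $X$, minus the computation cost, and that the belief variances have already been inflated as in lines 1-5 of Algorithm 2. The function names and the `compute_intrinsic` callable are hypothetical.

```python
import math


def recomputation_benefit(v_k, v_gamma, sigma_k, c_v):
    """W_k: expected gain from recomputing V_k, minus computation cost c_v."""
    if sigma_k <= 0.0:
        return -c_v  # the belief is exact, recomputation cannot help
    d = abs(v_gamma - v_k)
    density_term = (sigma_k / math.sqrt(2.0 * math.pi)
                    * math.exp(-d * d / (2.0 * sigma_k ** 2)))
    # Phi(-x) equals 0.5 * erfc(x / sqrt(2)).
    tail_term = d * 0.5 * math.erfc(d / (sigma_k * math.sqrt(2.0)))
    return density_term - tail_term - c_v


def rational_recompute(values, sigmas, costs, budget, c_v, compute_intrinsic):
    """Sketch of lines 6-17 of Algorithm 2; updates values and sigmas in place.

    compute_intrinsic(k) is a hypothetical callable returning a fresh
    estimate of the intrinsic VOI of measurement k.
    Returns the chosen measurement and the list of recomputed indices.
    """
    n = len(values)
    recomputed = []
    while True:
        order = sorted(range(n), key=values.__getitem__, reverse=True)
        best, second = order[0], order[1] if n > 1 else order[0]
        benefits = []
        for k in range(n):
            if costs[k] <= budget:
                # Compare with the runner-up when k itself is the leader.
                gamma = second if k == best else best
                benefits.append(recomputation_benefit(
                    values[k], values[gamma], sigmas[k], c_v))
            else:
                benefits.append(0.0)
        k_max = max(range(n), key=benefits.__getitem__)
        if benefits[k_max] <= 0.0:
            break
        values[k_max] = compute_intrinsic(k_max) - costs[k_max]
        sigmas[k_max] = 0.0
        recomputed.append(k_max)
    j_max = max(range(n), key=values.__getitem__)
    values[j_max] = compute_intrinsic(j_max) - costs[j_max]
    sigmas[j_max] = 0.0
    return j_max, recomputed
```

Because a recomputed belief has zero variance, its benefit drops to at most zero, so the loop performs at most one recomputation per measurement before it stops. A larger computation cost makes the loop stop earlier.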

3.1 Obtaining Uncertainty Parameters 

Uncertainty variance $\tau^2$ can be learned as a function of the total cost of performed measurements,
either off-line from earlier runs on the same class of problems, or on-line. Learning $\tau^2(c)$ on-line
from earlier VOI recomputations proved to be robust and easy to implement: $\tau^2$ is initialized to 0
and gradually updated with each recomputation of the value of information.
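
The update rule itself is not spelled out here, so the following is only one plausible sketch. It assumes, as a simplification, a constant $\tau^2$ (ignoring the dependence on the total measurement cost), a starting value of zero, and a running average of the squared change in a recomputed VOI divided by the number of measurements performed since its previous computation. The class and its names are hypothetical.

```python
class TauEstimator:
    """Running on-line estimate of tau^2 (illustrative assumption only)."""

    def __init__(self):
        self.tau2 = 0.0    # assumed starting value
        self.samples = 0

    def update(self, old_value, new_value, steps_since_last):
        """Fold one VOI recomputation into the estimate.

        steps_since_last is the number of measurements performed since
        old_value was computed; each step is assumed to add variance tau^2.
        """
        if steps_since_last <= 0:
            return self.tau2
        sample = (new_value - old_value) ** 2 / steps_since_last
        self.samples += 1
        # Incremental mean of the per-step squared changes.
        self.tau2 += (sample - self.tau2) / self.samples
        return self.tau2
```

With a zero starting value only the currently best measurement is recomputed at first, and the estimate grows as those recomputations reveal how much the VOI drifts between steps.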

4 EMPIRICAL EVALUATION 

Experiments in this section compare performance of the algorithm that recomputes the value of
information selectively with the original algorithm in which the value of information of every
measurement is recomputed at every step. Two of the problems evaluated in [Tolpin and Shimony, 2010]
are considered: noisy Ackley function maximization and SVM parameter search. For each of the
optimization problems, plots of the number of VOI recomputations, the reward, the intrinsic utility,
and the total cost of measurements are presented. The results are averaged over multiple (100) runs
of each experiment, such that the standard deviation of the reward is $\approx 5\%$ of the mean reward.
In the plots, the solid line corresponds to the rationally recomputing algorithm, the dashed line
corresponds to the original algorithm, and the dotted line corresponds to the algorithm that
selects measurements randomly and performs the same number of measurements as the rationally
recomputing algorithm for the given computation cost $c_V$. Since, as can be derived from (6), the
computation time $T_r$ of the rationally recomputing algorithm decreases with the logarithm of the
computation cost $c_V$, $T_r = \Theta(A - B \log c_V)$, the computation cost axis is scaled logarithmically.

4.1 The Ackley Function



The Ackley function [Ackley, 1987] is a popular optimization benchmark. The two-argument form
of the Ackley function is used in the experiment; the function is defined by the expression (7):

$$A(x, y) = 20 \cdot \exp\left(-0.2 \sqrt{\frac{x^2 + y^2}{2}}\right) + \exp\left(\frac{\cos(2\pi x) + \cos(2\pi y)}{2}\right) \tag{7}$$
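
For reference, a direct Python transcription of the two-argument function is given below. It assumes the standard Ackley form with the signs arranged for maximization, so that the global maximum lies at the origin; the function name is chosen here for illustration.

```python
import math


def ackley(x, y):
    """Two-argument Ackley function in its maximization form (assumed)."""
    radial = 20.0 * math.exp(-0.2 * math.sqrt((x * x + y * y) / 2.0))
    ripple = math.exp((math.cos(2.0 * math.pi * x)
                       + math.cos(2.0 * math.pi * y)) / 2.0)
    return radial + ripple
```

The cosine term produces many local maxima on a regular grid, which is what makes the function a demanding benchmark for measurement selection.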



In the optimization problem, the utility function is $u(z) = \tanh(2z)$, the measurements are normally
distributed around the true values with variance 0.5, and the measurement cost is 0.01. There
are uniform dependencies with $\sigma^2 = 0.5$ in both directions of the coordinate grid with a step of 0.2
along each axis. The results for the blinkered scheme [Tolpin and Shimony, 2010] are presented in
Figure 1.

Figure 1: The Ackley function, blinkered scheme.

4.2 SVM Parameter Search

An SVM (Support Vector Machine) classifier based on the radial basis function has two
parameters: $C$ and $\gamma$. A combination of $C$ and $\gamma$ with high expected classification accuracy should be
chosen, and an efficient algorithm for determining the optimal values is not known. A trial for a
combination of parameters determines the estimated accuracy of the classifier through cross-validation.
The SVMGUIDE2 [Hsu et al., 2003] dataset is used for the case study. The utility function is
$u(z) = \tanh(4(z - 0.5))$, the $\log C$ and $\log \gamma$ axes are scaled for uniformity to ranges [1..21], and
there are uniform dependencies along both axes with $\sigma^2 = 0.4$. The measurements are normally
distributed with variance $\sigma^2 = 0.25$ around the true values, and the measurement cost is $c_m = 0.01$.
The results for the myopic scheme are presented in Figure 2.

4.3 Discussion of Results 



In all experiments, a significant decrease in the computation time is achieved with only slight 
degradation of the reward; performance of the rationally recomputing algorithm decreases slowly 
with the computation cost and exceeds performance of the algorithm that makes random mea- 
surements even when VOI for only a small fraction of measurements is recomputed at each step. 

Figure 2: SVM parameter search, myopic scheme.

The exact dependency of the performance of the rationally recomputing algorithm on the intensity of
VOI recomputations varies among problems and depends both on the problem properties and on
the VOI estimate used in the algorithm.

5 CONCLUSION 

The paper proposes an improvement to a widely used class of VOI-based optimization algorithms.
The improvement makes it possible to decrease the computation time while only slightly affecting the
performance. The proposed algorithm rationally reuses computations of VOI and recomputes VOI only
for measurements for which a change in VOI is likely to affect the choice of the next measurement.